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Abstract 



There are often two important types of variation in functional data: the horizontal (or 

phase) variation and the vertical (or amplitude) variation. These two types of variation have 

been appropriately separated and modeled through a domain warping method (or curve 

registration) based on the Fisher Rao metric. This paper focuses on the analysis of the 

, ^, horizontal variation, captured by the domain warping functions. The square-root velocity 

function representation transforms the manifold of the warping functions to a Hilbert sphere. 

^^ Motivated by recent results on manifold analogs of principal component analysis, we propose 

Q\^ to analyze the horizontal variation via a Principal Nested Spheres approach. Compared 

OO with earlier approaches, such as approximating tangent plane principal component analysis, 

this is seen to be the most efficient and interpretable approach to decompose the horizontal 

variation in some examples. 

o 

(^ Keywords: Functional data variability. Time warping. Principal Nested Spheres 



X 1 Introduction 



A common variability in functional data is that prominent features in the functions vary in position 
from one sample to another, such as the timing variation of the adolescent growth spurt in human 
growth curves. This positional, or phase, variation is called the horizontal variation. Another 
important component of variability in functional data is the amplitude variation, or the vertical 
variation, such as the height difference among the individuals. 

There is a large literature on statistical analysis of functions, such as Kneip and Gasser (1992) 
[6], Locantore et al (1999) [8j. A general overview of functional data analysis is provided by Ramsay 



and Silverman (2002 [13] and 2005 [12] )• Plenty of useful tools and methods are available, such 
as the Functional Principal Component Analysis (FPCA), with many important applications in 
a wide variety of scientific fields. One open problem in functional data analysis is that, in those 
traditional approaches, the functional data are analyzed under the L^ metric, which tends to 
strongly focus on the vertical variation. The horizontal variation cannot be easily understood in 



these vertical analyses. Section |2.1| gives examples illustrating the shortcoming of conventional 
FPCA. 

The main purpose of this paper is to find an improved method for the horizontal analysis, 
i.e. the analysis of the horizontal variation. Considering the special spherical structure of the 



horizontal variation (see Section 1.2 for details), we propose to use an approach involving Principal 
Nested Spheres (PNS) introduced by Jung et al (2010) [4J. Comparison with several other popular 
approaches, such as the FPCA, suggests improved efficiency of PNS for horizontal analysis. A toy 
example (the left panel of Figure [I]) is used to illustrate the advantages of the PNS approach. 

We find Object Oriented Data Analysis, introduced by Wang and Marron (2007) [17], very 
helpful in studying horizontal variation. The data objects are understood as the atoms of the 
analysis. In this study, they are functions. Several different types of data objects (or functions) 
are considered in this paper for horizontal analysis. These different choices of data objects and 
the corresponding horizontal analyses are discussed in Section |2} 

1.1 Function Alignment Based on the Fisher Rao Metric 

A useful approach to horizontal analysis is through the idea of elastic functions. Some pioneering 
work in this area includes Ramsay and Li (1998) [11], Gervini and Gasser (2004) [3j, Liu and 
Mueller (2004) j^, Kneip and Ramsay (2008) [5J, Tong and Mueller (2008) [IS]. The basic idea 
is to first separate the vertical and the horizontal variation through function alignment, or curve 
registration. In particular, consider a collection of functions fi{t), i = 1,2,.. .,n, having both 
vertical and horizontal variation, such as the bimodal functions shown in Figure [I] (left). If these 
functions are well aligned by warping the domain properly, then the horizontal and the vertical 
variation can be separately captured by the domain warping functions 7i(t) and the resulting 
ahgned functions /j(7i(t)), respectively. For this toy example, such a set of warping functions and 
the corresponding aligned functions are shown in the middle and right panels respectively (details 



about finding those warping functions are discussed later). Tlien, tlie liorizontal analysis can be 
done by studying those warping functions. 
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Figure 1: Left: A toy example of biinodal functions with big horizontal variation. The color 
reflects the order of the horizontal positions of the peaks. Middle: The domain warping functions 
to align the functions based on the Fisher Rao metric. Right: The aligned functions. 



A crucial step in the function alignment is to find appropriate domain warping functions. 
Consider two functions /i and /2. Most of the past approaches involve solving inf^gr || /i — (/2 07) || 
to align /2 to /i, where || ■ || is the standard L^ metric, i.e. ||/|| = (Jq |/(t)p(it)^/^. However, this 
criterion is problematic, since the objective function is not symmetric in the sense that aligning 
/i to /2 leads to a different optimal minimum. To illustrate this point. Figure [2] shows a simple 
example of aligning two step functions. It is seen that aligning /2 to /i (middle) and aligning /i 
to /2 (right) are different under the L^ metric. The difference between the horizontally hatched 
blue area in Panel (2, 2) and the vertically hatched pink area in Panel (2, 3) indicates that the 
two corresponding objective functions ||/i — (/2 o 7)|| and ||(/i 07) — /2II are not equal. This is 
because the L^ metric is not invariant under re-parameterization, or domain warping. In particular, 
ll/i ~ /2II 7^ ll/i ° 7 — /2 ° 7||- A more appropriate metric is the Fisher Rao metric. See Srivastava 
et al (2011) [T3] for definition and relevant theory. This metric is derived from a Riemannian 
metric first introduced by C R. Rao (1945) fT^. A nice property of the Fisher Rao metric is that 
it is warping-invariant. In fact, Cencov (1982) [Ij proved that it is the only metric that has this 
property. Thus, we propose to use the Fisher Rao metric to align functions for the purpose of the 
horizontal analysis. 

Calculations based on the Fisher Rao metric are difficult. In practice, a convenient square-root 



velocity function (SRVF) representation, i.e. transforming the function f{t) to 



fit) 



simplifies 



the Fisher Rao framework. Under the SRVF representation, the Fisher Rao metric becomes the 
standard L^ metric, and thus, standard statistical tools for the L^ space, such as mean, covariance 
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Figure 2: The problem with L^ metric ahgnment. The top left panel shows two step functions 
/i (solid red) and /2 (dashed blue). The four right panels show that warping /2 to /i (middle) is 
different from warping f\ to f^ (right) under the L^ metric. The top two panels show the warping 
functions, while the bottom two panels show the aligned functions. Better than either is the Fisher 
Rao alignment shown in the bottom left panel, where the black dotted line indicates the Karcher 
mean function. 

and principal components, can be used. As an example of function alignment based on the Fisher 
Rao metric, the warping functions (middle) for the toy data (left) in Figure [I] are found by an 
automatic and unsupervised approach based on this metric, proposed by Srivastava et al (2011) 



1.2 PNS for Spherical Structure of Horizontal SRVFs 

One major benefit of the SRVF representation is that it transforms the manifold of the warping 
functions to a Hilbert sphere. In particular, suppose the domain of the functions /«, i = 1, 2, ..., n, 
to be [0,1] (if not, consider a linear transformation that maps the domain to [0,1]). Let ji 
be a warping function for /j, i.e. 7j G F = {7 : [0,1] — t- [0,1] | 7(0) = 0, 7(1) = 1 and 7 
is a diffeomorphism} , where a diffeomorphism refers to a bijective differentiable function whose 
inverse is also differentiable. Then the SRVF ipi of the warping function 7^, referred to later as 
the horizontal SRVF, can be written as a/t^. Noting that WipiW^ = Jq 4'i{t)'^dt = j^ 'ji{t)dt = 
7j(1) — 7j(0) = 1, these horizontal SRVFs naturally lie on the surface of a Hilbert unit sphere. 
Due to the spherical structure of the horizontal SRVFs, we propose to use the PNS method for 



horizontal analysis. Introduced by Jung et al (2010) [1], PNS is an extension of PCA for curved 
manifolds, especially for high dimensional spheres. This method finds a sequence of subspheres that 
best approximates the data using a backward approach, which starts with the high dimensional 
sphere and finds the best fitting subsphere of one dimension lower at each step. See Marron et 
al (2010) [9] for more discussion of backward PCA. It has been shown in a number of cases that 
PNS can provide more effective analysis of manifold data than many other analogous approaches. 
See Pizer et al (2011) |10j for such an example in the study of 3D shapes. 

For comparison purposes, another popular approach for data lying in curved manifolds. Prin- 
cipal Geodesic Analysis (PGA) proposed by Fletcher et al (2004) [2], is also investigated in this 
paper. Unlike PNS, it is a forward approach, starting with the Karcher mean (also called frechet 
or geodesic mean). In the following horizontal analysis, the Karcher mean refers to the representer 
defined in Definition 3 of Srivastava et al (2011) [15j. PGA approximates the spherical surface by 
a tangent hyperplane centered at the Karcher mean. By performing PCA on this tangent plane, 
PGA finds the principal geodesies (i.e. great spheres) passing through the mean that best fit the 
data. 

In contrast to PGA, the PNS method finds the best fitting subsphere regardless of whether it is 
a great sphere or not. When the major variance is non-geodesic, PNS tends to find the best-fitting 
small spheres instead of only great spheres. Thus, when the data variability on the sphere is big 
enough, the PNS can give a much more effective decomposition of this variability than PGA. On 
the other hand, if the data variability is small, the PNS method does not improve much over 
the PGA method. This is because in this case the data do not have much curvature and can be 
approximated by a tangent plane well enough. In the following discussion, we focus on examples 
with big horizontal variation. 

2 Horizontal Analyses 

This section compares different horizontal analyses with the toy example in Figure [I] (left), where 
the functions have big horizontal variation and the PNS method gives very useful improvement 
over PGA. 

Figure [3] (left) visualizes the pure horizontal shifts of the peaks for this toy example, which 
is done via warping the Karcher mean function (red curve in the right panel) by the Fisher Rao 



warping functions in Figure [T] (middle). These functions will be referred to later as the horizontally 



shifted functions, denoted by hi 
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Figure 3: Left: Horizontal variation of the toy example in FigureU] (left). The color reflects the 
order of the horizontal positions of the peaks. Right: The Karcher mean function (red solid line) 
and the cross-sectional mean (blue dashed line). 



Three different types of data objects for horizontal analysis are studied in this section: the 
horizontally shifted functions hi, the warping functions 7j and the horizontal SRVFs ipi. Oriented 
by the choice of data objects, four different horizontal analyses have been performed on the toy 
data: (1) FPCA of the horizontally shifted functions hi] (2) FPCA of the warping functions 7^; (3) 
PGA of the horizontal SRVFs ipf, (4) PNS of the horizontal SRVFs tpi. The first two approaches, 



using the conventional FPCA, are discussed in Section 2.1 The latter two manifold approaches 



motivated by the spherical structure of the horizontal SRVFs, are discussed in Section [2^ Section 
|3] summarizes the comparison of these four approaches. 



2.1 Conventional FPCA 

An intuitive way to understand the horizontal variation of the toy data is to analyze either the 
horizontally shifted functions hi or the warping functions 7^. As the FPCA is one of the most 
widely used statistical tools for functional data analysis, this section discusses FPCA of the two 
different types of functions respectively. It is seen that the conventional FPCA is rarely a good 
option for horizontal analysis. 

2.1.1 FPCA of Horizontally Shifted Functions 

FPCA involves centering data with the cross-sectional mean based on the L^ metric. However, in 
the case of big horizontal variability, this cross-sectional mean may hardly be useful in capturing 
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the underlying shape of the functions. In the toy example, the unimodal cross-sectional mean (the 
blue dashed line in the right panel of Figure [s]) is a poor representative of the bimodal functions 
hi, while the bimodal Karcher mean (the red solid line) based on the Fisher Rao metric is much 
more satisfying. Thus, the resulting PC projections from the FPCA of the horizontally shifted 
functions hi are difficult to interpret. See panels in the first column of Figure |4] for the first two PC 
projections. Therefore, the FPCA of the hi is not an appropriate approach for horizontal analysis. 
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Figure 4: Horizontal analyses of the toy data. The color is consistent with that in the first 
panel in Figure\^ From the left column to the right: FPCA of the horizontally shifted functions, 
FPCA of the warping functions, PGA of the horizontal SRVFs, PNS of the horizontal SRVFs. 
Each column shows the first two components of each analysis. Note that successive improvement 
in quality of data representation and signal compression are shown. 



2.1.2 FPCA of Warping Functions 

One challenge of performing the FPCA on the domain warping functions is how to interpret each 
component. Better interpretation comes from transforming the decomposition of the warping 
functions into the original function space, i.e. warping the Karcher mean function by the PC 
projections. The second column of Figure |4] shows the first two transformed PC projections for 
the toy example. These two components provide a much more useful summary of the apparent 
horizontal variation in the raw data than the previous ones from the FPCA of the hi (first column). 



The first component reflects the horizontal shifts of the peaks, while the second one is about the 
horizontal distance between the two peaks. 

However, this approach has a serious weakness. That is, the PC projection of a warping func- 
tion is not necessarily bijective, and thus, not a warping function. In other words, the conventional 
FPCA leaves the space F of warping functions. To illustrate this. Figure [5] shows the FPCA of 
a set of simple two-dimensional warping functions 7^ (left panel), each of which is determined by 
two values, 7j(1/3) and 7j(2/3). It is seen in the right two panels that some of the PCI projections 
(cyan) have a decreasing part, i.e. 7j(1/3) > 7^(2/3). Warping the Karcher mean function with 
these non-warping PC projections is problematic. Computationally, this causes the wiggly right 
end of the yellow and the red functions in Panel (1, 2) of Figure |4J 
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Figure 5: A toy example to illustrate the problem of performing the FPCA on warping functions. 
Left: A set of two-dimensional warping functions ji, each determined by 7j(1/3) and 7j(2/3). 
Middle: The scatter plot of 7j(1/3) and 7^(2/3) (blue circles), and the PCI direction (red line) 
of these points. The red crosses indicate the PCI projections above the black diagonal line, 
and the cyan triangles indicate the PCI projections below the line. Right: The projected curve 
visualization of those PCI projections. Note that the cyan curves are not bijective, i.e. not valid 
warping functions. 



2.2 Analyses on SRVF Manifold 



The following analyses avoid the problem shown in Section 2.1.2[ by appropriately using the 
spherical structure of the horizontal SRVFs tpi. The idea is to first decompose the variability 
of the spherical SRVFs and then transform the projections of the SRVF components back to 
the warping function space F using the formula 75 (t) = /q ^(t)'^dt, where ^ is a point on the 
SRVF sphere. It can be easily checked that 7^ G F. Finally, the decomposition of the horizontal 
variation of the original functions can be obtained via warping the Karcher mean function with 
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the transformed SRVF projections. The following discussion shows how manifold approaches, 
especially the PNS approach, can work better for the horizontal analysis than the conventional 
FPCA. 

2.2.1 PGA of Horizontal SRVFs 

The third column in Figure |4] shows the first two components of the horizontal variation in the toy 
data, based on the PGA of the horizontal SRVFs. Compared with the previous FPCA results (the 
first two columns), this approach gives a much better decomposition of the horizontal variation. 
The first component captures most of the horizontal shifts of the peaks. The second one is similar 
to the PC2 of the warping functions in Panel (2, 2), but has a visually smaller horizontal variation. 
This shows more of the underlying signal in the data has been moved to PGl, i.e. better signal 
compression. 

2.2.2 PNS of Horizontal SRVFs 

The first two components of the horizontal variation in the toy data based on the PNS of the 
horizontal SRVFs are shown in the fourth column in Figure |4} Results from this decomposition 
give more signal compression than those from the previous analyses. The first component simul- 
taneously captures both the mode of peak location and the mode of distance between peaks. The 
two components previously needed have been reduced to one. Among the four panels in the first 
row, these PNSl projections explain the horizontal variation of the original bimodal functions 
best, as they are almost identical to the raw horizontal warps of the Karcher mean, shown in the 
left panel of Figure |3j Very little variability is left for the second PNS component to explain. 
This suggests that the horizontal variability is almost one dimensional in some sense, which is 
consistent with the fact that the warping function 7^ in this toy example can be summarized by a 
single parameter a^. In particular, these were generated as 7j(t) = "^l'. zl , for Oj G [—5, 5]. 

For further insight of this type. Figure [6] shows the score scatter plot of the first two PNS 
components (third panel), which has a similar pattern to that of the scatter plot of the second 
and the third PC scores (second panel). This is because most of the horizontal SRVF variation 
is along some small circle, which is captured by the PNSl. However, PGA needs two principal 
geodesies (the first two; see the first panel for the corresponding score scatter plot) to capture the 
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curvature of this small circle. In other words, the first two PNS components are able to explain 
the data variability captured by the first three components in the PGA. Thus, the PNS approach 
gives better signal compression by using more flexible components. 
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Figure 6: The Grst two panels show scatter plots of the first three tangent PC scores in the 
PGA of the horizontal SRVFs. The third panel shows scatter plot of the first two PNS scores. 
The aspect ratio of these plots is 1. The color is consistent with that in Figure [sl (left). The 
right panel shows scree plots of four horizontal analyses, showing both the individual (solid line) 
and the cumulative (dashed line) proportion of variance explained by the first three components. 
The text indicates the names of the analyses, and the order of the names (from top to bottom) is 
consistent with the order of the proportion of variance explained by the first component (the first 
point). This plot shows improved signal compression by the more sophisticated methods. 



The right panel in Figure [6] visualizes the proportion of variance explained by the first three 
components in each of the four analyses. It is seen that the PNS of the horizontal SRVFs (red, 
with the highest first point) and the FPCA of the warping functions (cyan, with the second 
highest first point) are the top two most efficient approaches, in the sense of explaining a higher 
proportion of data variability using a lower number of components. However, considering the 



difficulty in interpreting the FPCA decomposition (see Section 2.1.2), PNS is the best approach 
for the horizontal analysis. 



3 Conclusions 

This paper aimed at finding an appropriate method for horizontal analysis of functional data, 
where the horizontal variation is separated from the vertical variation using a domain-warping 
method based on the Fisher Rao metric. Four different approaches have been discussed, including 
two conventional FPCA approaches (FPCA of the horizontally shifted functions of the Karcher 
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mean and FPCA of the warping functions) and two manifold approaches based on the spherical 
structure of the horizontal SRVFs (PNS and PGA). A toy example of big horizontal variation 
is used to compare these approaches. The manifold approaches are generally better than the 
FPCA approaches, and the PNS works the best in terms of both the signal compression and the 
interpretability of the results. As a conclusion, we propose to use the PNS approach for horizontal 
analysis, especially when the horizontal variability is large. 
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